Furley et al. (2016) recently reported that among the 100 most valuable soccer players, but not the 100 richest billionaires, individuals born in the first compared to the second half of the year were overrepresented (60 vs. 40%) and had higher estimated monetary value (EMV).
The idea of testing whether birthdate-related selection cut-off dates are associated with players' EMV is attractive. Implications for people advocating equal chances of high sporting achievement might be suggested, provided that evidence is substantive. Here I show that the data do not support the authors' interpretation that “relative age influences the value of the best soccer players.”
Instead of categorizing players into birth halves and thereby losing potentially meaningful information (examples of own work that can be legitimatelly criticized in this respect: Loffing et al., 2010; Schorer et al., 2015), correlational analysis could have been used to test for an association between month of birth and EMV. Neither, parametric (r = −0.067, p = 0.509) nor non-parametric tests (e.g., τb = −0.004, p = 0.955) indicate that EMV decreases as players are born later in the year (Figure (Figure1A1A)1.
Figure 1
(A) Soccer players' estimated monetary value as a function of players' month of birth (1 = January, …, 12 = December). The solid line illustrates the linear relationship between the two variables (r2 = 0.004; yEMV = −0.5946xMoB + 44.52). ...
The authors' conclusion “that birthdates (…) can actually result in higher monetary value” (p. 2) is based on a biased t-statistic2. Eighty-two players have an EMV below 50 million €, whereas three players have an EMV above 100 million € (see Figure Figure1B).1B). Lionel Messi (born on 24 June 1987) is the most valuable player (268.05 million €). Thus, among the most valuable players very few have very outstanding EMV. These “outliers” bias the t-statistic; e.g., there is a clear Lionel Messi effect. To illustrate, exclusion of Messi only, t(97) = 0.696, p = 0.488, d = 0.14 (−0.26, 0.54), or assuming him to be born just 1 week later on 1 July 1987 (i.e., second half of the year), t(98) = −0.432, p = 0.666, d = −0.09 (−0.48, 0.31), nullifies the authors' conclusion (e.g., see the range in 95% CIs for effect sizes calculated with Exploratory Software for Confidence Intervals; Cumming, 2012).
However, EMV are not distributed normally neither overall (e.g., Kolmogorov-Smirnov test: Z(100) = 0.275, p < 0.001) nor within groups [first half: Z(59) = 0.289, p < 0.001; second half: Z(41) = 0.213, p < 0.001; Figure Figure1B].1B]. To avoid that single cases bias interpretation (e.g., compare means and medians illustrated in Figure Figure1C),1C), a non-parametric test like Mann-Whitney-U seems a reasonable alternative. With Lionel Messi included, no group difference is found, U = 1191, z = −0.130, p = 0.897, r = −0.013. Importantly, even if there were a meaningful EMV difference between players born in the first vs. second half of the year this would not indicate “influence” of relative age on EMV, but only suggest an association between the two variables at best.
Another concern relates to the comparison of the proportion of players born in the first vs. second half of the year against an equal distribution. Relative age seems relevant to selection into professional soccer as reflected in a skewed birth distribution in the population of professional soccer players (e.g., Musch and Hay, 1999; Cobley et al., 2008; Helsen et al., 2012; Schorer et al., 2015). If we take a specific look on, say, the 100 most valuable players and want to test whether being born in the first vs. second half of the year alters the chances of being among the top 100, these players' birth distribution can a priori be expected to be skewed. Therefore, it should be tested against soccer professionals' birth distribution (57.53 vs. 42.47%; big-5 European Leagues in the season 2015/2016; see Supplementary Material online for details), not against a “uniform” distribution in the general population. Doing so reveals no relevant effect, χ(1,N=100)2= 0.09, p = 0.77, OR = 1.06 (0.61, 1.86). While exemplified here, the critique on reference values may similarly apply to other relative age research in sports as well (see Delorme and Champely, 2015, for details).
“Taken together, (…) broad implications that need to be taken seriously by political decision makers” (Furley et al., 2016, p. 2) should be based on proper study design and statistical methods in whatever domain. The above concerns illustrate that the data considered by Furley et al. (2016) do not legitimate their conclusions. A commentary on the study's limitations was necessary, first, to point out the importance of data inspection and critical assessment of the impact of individual cases on statistics, and second, to not suppress the discussion of a potential “underdog”-effect, which suggests that, in the long run, players born relatively later in the year may even have an advantage, or no disadvantage at least, in adult elite sporting competition (e.g., Ashworth and Heyndels, 2007; Schorer et al., 2009; Gibbs et al., 2012). Given the latter aspect, a directed hypothesis in favor of players born earlier in the year seems not well-grounded and, therefore, here all tests are two-tailed (however, see Gibbs et al., 2015, for a critique on applying inferential statistics on such sort of data).
Furley et al. must be honored for making their dataset publicly available. This is an important step toward transparency in science, thereby hopefully facilitating reproducibility as well as evaluation of data analysis and interpretation pre (e.g., during peer review) and post publication (Drummond and Vowler, 2011; Open Science Collaboration, 2015). Regrettably, the dataset at hand illustrates that single birthdays may mess[i] up your statistics.